Using new chemical procedures, mass spectrometric instrumentation and
appropriate computerized data analysis, the diagnosis of a number of
infectious diseases through the molecular weight profile of neutral
metabolites in urine was demonstrated. Longitudinal studies on human
volunteers infected with sandfly fever showed the appearance of a
characteristic pattern prior to the onset of clinical symptoms, which
persisted for some time after the symptoms had subsided. Tissue cultures
infected with polio virus exhibited a characteristic pattern demonstrable
within 6 hrs of infection.

1.  INTRODUCTION

This is the second annual progress report on a research program entitled
"MASS SPECTROMETRIC RAPID DIAGNOSIS OF INFECTIOUS DISEASES", under Contract
No. DAMD 1778D8035, sponsored by the Department of the Army, U.S. Army Medical
Research and Development Command, Fort Detrick, Maryland. This progress
report covers the period from February 1, 1979 through January 31, 1980.

The capability of making a rapid and reliable diagnosis of infectious
diseases at an early stage and at low cost would be of especially great value
to the military, where large numbers of army personnel are stationed in
confined areas and their continuing health is crucial to carrying out their
objectives. Early and reliable diagnosis of an infectious disease could
prevent the spread of disease to large groups of military and civilian
personnel.

Multicomponent analysis may be used to identify in the host's urine
characteristic metabolic patterns associated with infection in general, with
bacterial or viral infection, or with specific infections. The multiscan mass
spectrometric method offers four types of uses in the diagnosis of infectious
diseases. First, multicomponent analysis by mass spectrometry may be used as
a diagnostic tool, allowing diagnosis of an infection during the incubation
period, thus facilitating timely isolation and appropriate treatment. Second,
the same technique may be used for the identification of bacteria and viruses
in vitro. Third, the characteristic components identified by the pattern
recognition approach can be chemically characterized by the FI-CID technique,
leading to an understanding of the biochemical nature of the host's
reaction. Fourth, a small number of metabolites identified by mass
spectrometry could then be determined by non-mass spectrometric analytical
techniques (e.g., glc, hplc, or specific fluorometric determinations) and
advantageously used for routine early diagnosis.


During the second year of this second phase, we have achieved a number of
critical objectives. After further improvements and optimization of the
sample preparation techniques and after finding optimal conditions for mass
spectrometric analysis (utilizing a double focussing configuration), we have
devoted a substantial effort to comparing and critically analyzing alternative
statistical multivariable diagnostic procedures. However, most of the effort
this year has been devoted to the analysis of clinical samples and thus to the
evaluation of the potential usefulness of our methodology as a clinical
diagnostic technique. First, we have demonstrated a high degree of success in
separating a number of different groups of patients by the metabolic profiles
of their urines. These included patients with alcoholic liver disease,
children with pneumonia and children with virus-induced diarrhea; the patients
could be readily differentiated by comparison with healthy subjects of the
corresponding age group with practically no false-positives or
false-negatives. Next, we have carried out a longitudinal study on urine
samples obtained from groups of volunteer subjects vaccinated with live virus
of sandfly fever and dengue fever and followed up for a number of weeks.
Although this study is not yet complete, it suggests that a diagnostic pattern
associated with the infection can be detected before the onset of clinical
symptoms and that it subsides after their disappearance. Differences in the
rate of individual reaction to the infection could also be shown.

In another study we have shown that the molecular weight profile of a
human tissue culture medium exhibits significant changes within hours after
the cells are infected with a virus. Human lung tissue cultures infected with
polio virus demonstrated a diagnostically useful pattern 6 hours following
exposure to the virus. This feasibility study is now being continued to
establish the scope and limitations of this early detection technique.

In brief, during the second year of this project preliminary experiments
have shown that metabolic profiles of the host can be used to identify
infected subjects, to differentiate between patients with different
infections, to detect a viral infection prior to the onset of clinical
symptoms and to demonstrate the infection some time after the clinical
symptoms have subsided. Next, we have shown that mass spectrometric
multicomponent analysis can be a highly useful tool in the clinical laboratory
by detecting the presence of a virus through its effect on the metabolism of
tissue cultured cells. A diagnostic biochemical pattern seems to be
distinguishable within a few hours, which is significantly faster than
detection by the morphological changes presently used. The coming third year
of this project will be devoted to substantiating our findings on human host
reactions and on early virus detection and identification in vitro.

To  reiterate  the  above  in  a  more  explicit  manner,  during  1979  we  have 
accomplished  the  following  tasks: 

1.  Developed  and  tested  sample  sterilization  and  storage  procedures.

2.  Developed  new,  simpler  sample  preparation  techniques  including  one  to 
handle  tissue  culture  media. 

3.  Improved  on  the  mass  spectrometric  procedure. 

4.  Improved  the  interfacing  with  the  INCOS-NOVA  and  the  CYBER  173  computers. 

5.  Tested  and  evaluated  different  statistical  analysis  classification 
procedures. 

6.  Analysed numerous samples from children and adults with different
pathological problems and demonstrated the efficiency of our diagnostic
procedure.

7.  Analysed series of samples obtained from virus-infected volunteers over a
period of 3 to 4 weeks and demonstrated the appearance of a pathological
pattern which disappeared after the disappearance of the clinical symptoms.

8.  Analysed media of human tissue cultures infected with polio virus 6 and 24
hours following exposure to the virus and demonstrated detectable changes
at 6 hours.

2.  EXPERIMENTAL  TECHNIQUES  AND  PROCEDURES


The following is an updated description of our experimental techniques and
procedures. Although some of the material described below was included in the
previous report, significant changes have been introduced this year in a
number of phases of the analytical procedure, which warrant reporting the
whole analytical procedure anew.


A.  Sample  Collection  and  Storage  Procedure


Urine samples were collected by hospital staff in 12 ml plastic collection
tubes with self-sealing caps (Kova-Tubes™) and kept frozen until analysis.
As an enzyme denaturant, bactericide and potential virus inactivator, 0.1 ml
of 0.5 M ZnSO4 was placed in the tubes given to the hospital or clinic. This
gives a final concentration of 0.005 M ZnSO4 in the collected urine. Both
ZnSO4 and HgCl2 were recently tested by Dr. Howard Faden of the Children's
Hospital for their ability to inactivate polio virus (Mahoney polio
strain). The viral solution was supplemented with one or the other of the
salts and then treated with EDTA to remove excess metal ions, which are
themselves cytotoxic. Viral solution treated with 5 mM HgCl2 did not cause
infection when added to a culture of human embryonic lung tissue. On the
other hand, concentrations of up to 10 mM ZnSO4 were not effective
in inactivating the virus.
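
The stated final ZnSO4 concentration can be checked with a simple dilution
calculation. The sketch below assumes a total volume of about 10 ml (urine
plus additive); the collected volume is not stated in the text, so this figure
is an assumption chosen for illustration, and the function name is
hypothetical.

```python
def final_molarity(stock_molarity, stock_volume_ml, total_volume_ml):
    """Concentration after diluting a stock aliquot (C1 * V1 = C2 * V2)."""
    return stock_molarity * stock_volume_ml / total_volume_ml


# 0.1 ml of 0.5 M ZnSO4 diluted to an ASSUMED total volume of 10 ml.
zn_molarity = final_molarity(0.5, 0.1, 10.0)
print(f"{zn_molarity:.4f} M")
```

With a smaller collected volume the final concentration would be
proportionally higher.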


Samples (5 ml aliquots) are thawed and 0.005 M ZnSO4 is added to samples
that were collected without this "preservative". The presence of zinc was
found to have a minimal but detectable effect on the molecular weight profile,
so all samples that belong to ongoing series are presently processed with this
reagent added. In the future HgCl2 will be substituted for ZnSO4 in this
protocol. After the sample reaches room temperature, an equimolar amount of
EDTA is added to chelate the excess zinc ions, and the pH of the sample is
adjusted to 7.2 with HCl or NaOH. Chelation of the excess zinc with EDTA
avoids precipitation of Zn(OH)2 during pH adjustments. Such precipitation
could potentially alter profile patterns by co-precipitating organic
constituents in the urine. A titration curve of urine indicates that pH
values less than 2, near 7.2 and above 10 are likely to give reproducible
profile patterns. In intermediate regions there appear to be partially ionized
constituents having extraction yields that are highly pH dependent. In a
separate series of experiments, using the Wilcoxon-MW test to evaluate the
results, it was determined that insignificant pattern changes occurred within
a pH range of ± 0.5 pH unit of 7.2.


Droplets  of  thawed  urine  are  also  tested  using  Ames  Multistix  reagent 
strips  to  assay  pH,  protein,  blood,  glucose,  ketone  bodies,  bilirubin  and 
urobilinogen.  These  results  are  recorded  for  correlation  with  hospital 
testing  of  the  same  sample,  and  to  indicate  samples  that  should  be  excluded 
from  data  sets  on  the  basis  of  abnormal  kidney  function. 

We have also initiated a study on the effects of various means of sample
storage. This study is outlined in Figure 1. A volume of urine was collected
from a healthy adult male. This volume was divided into 5 ml aliquots that
were refrigerated, frozen, or left standing at room temperature without
preservative, in capped 12 ml plastic Kova-Tubes™. Three samples
were prepared immediately from the fresh urine. Samples stored under the
conditions outlined in Figure 1 were subsequently prepared in triplicate by
the same method.

Two different methods were employed to defrost the frozen samples. At
each time indicated, three tubes were allowed to defrost by standing at room
temperature for approximately 45 minutes. Three other tubes were more rapidly
thawed by holding them under running water. This study is still in progress
and the results will be reported in next year's report.

B.  Sample  Preparation  Procedure

Samples are applied to the top of a 6 mm O.D. x 25 cm glass column
containing 18 cm of prewashed Chromosorb P above 2.5 cm of Na2SO4 and a
glass wool plug. The column outlet, a 10 cm length of 0.25 mm I.D. stainless
steel capillary tubing, and a solvent reservoir of 12 cm O.D. x 100 cm glass
tubing attached to the top of the column, are assembled using stainless steel
Swagelok unions and Teflon ferrules (see Fig. 2). The sample is
applied to the column using nitrogen pressure supplied through a Swagelok
fitting at the top of the column. The Na2SO4 serves to retain any aqueous
sample that emerges from the Chromosorb. After the sample is adsorbed, 5 ml
of dichloromethane are added to the reservoir and forced through the
Chromosorb column with N2 pressure at a flow rate of 0.5 ml/min. The
emerging eluate is continuously absorbed onto a 1 mm x 2 cm strip of Whatman
GF/A glass fiber filter paper at the bottom of a conical tube. The eluate on
the paper is continuously concentrated by a stream of dry N2 directed toward
the bottom of the collection tube. The dried eluate is stored in a glass vial
until mass spectrometric analysis. The filter paper replaces the micro column
of Chromosorb previously used (see last year's report). The glass wool plugs
of those micro columns frequently loosened, resulting in sample loss during
loading into the mass spectrometer inlet probe. The glass fiber paper has a
lower background than the Chromosorb column and gives evaporation profiles
similar to those obtained with the micro columns.

C.  Computerized  Data  Acquisition 

We have successfully interfaced each of our spectrometers to the FINNIGAN
2400 data system. The data system generates a digital scan function with
operator selectable scan times and upper and lower mass limits. This digital
scan function is applied to a 16-bit D/A converter to provide an analog signal
to drive the spectrometer magnet. For use with the double focusing CID
instrument this analog signal is applied to a current programmable magnet
power supply (Alpha Scientific Model 3048) using an Analog Devices Model AD284J
isolation amplifier. This configuration (computer generated digital scan, D/A
converter and current programmable supply) gives highly reproducible scans on
each instrument to which the computer is interfaced. For field ionization
operation, the accelerating voltage is monitored with a 4 1/2 digit digital
voltmeter and maintained at the same value (nominal 5000 V) at the beginning of
each sample run to provide the same mass-to-time function for each analysis.
The detector output is sampled by a 12-bit A/D converter with computer
controlled integration times of 25 to 200 microseconds (see Fig. 3). Mass
assignment is achieved by a time-to-mass calibration curve established with a
known mixture of reference materials. The calibration algorithm utilizes a
higher order curve fitting, capable of accurate interpolation and
extrapolation for mass assignment, provided the spectrometer scan function is
reproducible. In our system the digital scan function and the stable current
programmed power supply give mass assignment stability within 0.3 amu from
sample to sample, including drifts due to turning the accelerating voltage
supply on and off. The magnet is continuously and repetitively
scanned by the computer during the day, and acquisition of data is initiated
by operator command through the data system display terminal. Mass assignment
drifts of less than 1 amu are generally maintained over several days.
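
The time-to-mass calibration described above can be illustrated with a small
least-squares polynomial fit. This is only a sketch: the report does not state
the order or form of the fitted curve, so the quadratic used here, the scan
times, and the function names are all assumptions made for illustration.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns c[0] + c[1]*x + ... coefficients."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y*x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def mass_at(coeffs, t):
    """Evaluate the calibration curve at scan time t (seconds)."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def hypothetical_scan_law(t):
    """Made-up mass-versus-time law used only to generate example data."""
    return 1.0 + 20.0 * t + 1.45 * t * t


# Hypothetical arrival times (s) of seven reference peaks within one scan.
ref_times = [2.0, 3.5, 5.0, 6.5, 7.5, 8.5, 9.0]
ref_masses = [hypothetical_scan_law(t) for t in ref_times]

calibration = fit_polynomial(ref_times, ref_masses, degree=2)
# Extrapolate beyond the last reference peak, as is done above m/e 298.
print(round(mass_at(calibration, 10.5), 2))
```

Extrapolation of this kind is only as good as the reproducibility of the scan
function, which is why the text stresses the digital scan and the stable
current programmed supply.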

In spite of the inherent mass assignment reproducibility we recalibrate
the instrument for each sample. Calibration for field ionization is more
difficult than with electron impact. Due to the absence of fragment ions, we
must calibrate with a mixture of compounds with similar vapor pressures, in
order to obtain spectra with all calibration peaks present in a single scan.
Presently we use a mixture of seven compounds (see Figure 4) covering a
molecular weight range from m/e 73 to m/e 298. Using an instrument with a
nominal resolution of 500, the calibration with this mixture is generally
accurate to 0.05 amu at mass 300. Assignment of masses outside the 73-298 amu
range is by extrapolation of the computed calibration curve. Extrapolated
assignment of masses up to 100 amu above m/e 298 is accurate and stable to
± 0.2 amu. This is measured by observing the assignment of a high M.W. peak
(e.g. cholesterol, MW 386) using several different calibration curves.

FIGURE 4

For  analysis  of  urine  extracts  the  mass  spectrometer  is  scanned  from  1  to 
450  amu  in  12  seconds.  The  total  scan  cycle  time  is  17  seconds  including  the 
time  spent  returning  to  the  starting  mass,  and  allowing  for  field 
stabilization.  Samples  are  placed  in  the  solid  probe  which  is  placed  in 
contact  with  the  ion  source.  Data  acquisition  starts  as  soon  as  the  sample  is 
in position near the ion source. The sample is held at 20°C by air cooling
of the probe during sample introduction and following initial contact with
the ion source, which is maintained at 200°C. The probe temperature is then
linearly programmed from 20°C to 200°C over 20 min, and held at
200°C for 5 minutes.

The data system records and stores 90 individual mass scans during the
sample volatilization. These scans may be displayed in graphical or tabular
format during acquisition. The data are stored in an accurate mass format with
the mass of each peak assigned to within ± 60 ppm by the current calibration
file. At the end of the acquisition period, the individual scans are summed
following conversion of the accurate masses to nominal masses, and this
integrated spectrum is used in all subsequent data processing steps.
Immediately following the urine sample, a new calibration sample is introduced
and 10 scans are acquired over a temperature increase from room temperature to
approximately 70°C. Due to the high volatility of some components in the
low molecular weight range, large changes in the relative peak intensities
occur during the calibration run, and only the scans which have sufficient
intensity of all components of the mixture are utilized for calibration.
The drift in mass assignment of one of the components, methyl stearate, MW
298.28, is compared between this and the previous calibration file. This peak
has the smallest intensity and the highest mass in the mixture, so that its
mass assignment is subject to the largest variation due to ion statistics and
scan irreproducibility. Differences in mass assignment for two consecutive
calibration runs are typically within 0.1 amu.
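
The summation of individual scans into a single integrated nominal-mass
spectrum, as described above, might be sketched as follows. The report does
not say how accurate masses are converted to nominal masses; simple rounding
to the nearest integer is assumed here, and the function name and peak values
are hypothetical.

```python
from collections import defaultdict


def integrate_scans(scans):
    """Sum peak intensities over all scans, keyed by nominal (integer) mass.

    Each scan is a list of (accurate_mass, intensity) pairs. Conversion to
    nominal mass by rounding is an assumption made for this sketch.
    """
    integrated = defaultdict(float)
    for scan in scans:
        for accurate_mass, intensity in scan:
            integrated[round(accurate_mass)] += intensity
    return dict(integrated)


# Two hypothetical scans; a real run would contain 90 of them.
example_scans = [
    [(180.06, 100.0), (194.08, 40.0)],
    [(180.07, 50.0), (292.20, 10.0)],
]
integrated_spectrum = integrate_scans(example_scans)
print(integrated_spectrum)
```

Summing over the whole evaporation profile discards the time dimension, so
the result depends only on the total amount of each component volatilized.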

After calibration the source and probe are baked out to remove any
residue, in preparation for the next sample. The entire sequence of
calibration, bakeout, and sample analysis requires about one hour. At least
once each day the sensitivity of the ion source is measured by evaporating 1
microgram of adenine and acquiring data with the same scan sequence used for
urine samples. The integrated signal for this sample is required to be
greater than 5000 ions. Correcting for the fraction of scan time actually
spent monitoring the molecular ion (10 ms/17 sec), this is equivalent to
10^-12 Coulombs/microgram. Over the actual scan range employed this implies
a minimum of 100 ions collected for 20 ng of a single component in an actual
sample mixture. Examples of calibration and sample data are given in Figs.
5-9.
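
The sensitivity figures quoted above follow from simple arithmetic, which the
short sketch below reproduces. It assumes singly charged ions and a signal that
scales linearly with sample amount; the variable names are illustrative only.

```python
ELEMENTARY_CHARGE = 1.602e-19  # coulombs per singly charged ion

ions_detected = 5000   # minimum integrated signal for 1 microgram of adenine
dwell_s = 10e-3        # time per scan spent on the molecular ion
cycle_s = 17.0         # total scan cycle time

# Ions that would be collected if the molecular ion were monitored all the time.
ions_continuous = ions_detected * cycle_s / dwell_s
charge_per_microgram = ions_continuous * ELEMENTARY_CHARGE

# Ions collected under normal scanning from 20 ng (0.02 microgram) of one
# component, assuming linear scaling with sample amount.
ions_for_20_ng = ions_detected * 0.02

print(f"{charge_per_microgram:.2e} C per microgram")
print(f"{ions_for_20_ng:.0f} ions for 20 ng")
```

The result, a little over 10^-12 coulombs per microgram, agrees with the
order of magnitude given in the text.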

A  standard  urine  extract  is  also  analyzed  at  least  once  a  week  as  an 
overall  test  of  instrument  performance.  This  sample  is  a  10  microliter 
aliquot  of  a  concentrate  obtained  from  the  extraction  of  20  ml  of  urine
using  a  larger  column  appropriately  scaled  up  in  size  to  cope  with  a  bulk 
sample. 

[Figure: computer printout of a mass spectrum listing (columns: MASS, % RA, % RIC, INT).]

Initially, profiles were recorded using a single focusing magnetic sector
spectrometer (see last year's report). In certain samples, particularly from
patients with liver disorders, we encountered mass regions where ions appeared
with a continuum of energies covering several amu of the mass scale. Further
analysis revealed that many of these ions possessed energy-to-charge ratios
greater than the main beam energy. These ions are a result of the formation
of doubly charged species, which upon collision with neutral molecules either
gain an electron or decompose to smaller fragments. In our instrument the
most likely place for such collisions to occur is in the lens between the
source and the object slit. This region has the highest density of neutral
molecules as well as of focused, decelerated ions. The distribution of
energies arises from the fact that these processes may occur anywhere between
the ionizer and the object slit in a region with a potential gradient. The
final energy of the ion will be a function of the position where the
transition from doubly charged to singly charged species occurred. The
acceleration of the ion as a doubly charged species, even over a small
fraction of the net accelerating voltage, will result in an ion with a final
energy greater than the main beam energy. In order to obtain unit mass
resolution of the singly charged normal molecular ions, a double focussing
instrument employing an energy analyzer is required. Since the extent of
these processes is variable and may occur in any sample, we now analyze all
samples on the reverse geometry double focussing instrument (the CID
instrument) described in our original proposal.
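
The energy argument above can be made concrete with a simplified model. The
sketch assumes an ion that traverses the first fraction f of the accelerating
potential V as a doubly charged species and the remainder as a singly charged
species after capturing an electron, with no kinetic energy exchanged in the
collision. Decomposition into fragments, where the energy is shared between
the products, is not modelled. The function name is hypothetical.

```python
def final_energy_ev(accel_voltage, fraction_as_doubly_charged):
    """Final kinetic energy in eV: 2e over f*V, then 1e over (1 - f)*V."""
    f = fraction_as_doubly_charged
    if not 0.0 <= f <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return 2 * f * accel_voltage + (1 - f) * accel_voltage


# Nominal 5000 V accelerating voltage, as used for field ionization.
for f in (0.0, 0.1, 0.5, 1.0):
    print(f, final_energy_ev(5000.0, f))
```

In this model the energy equals V(1 + f) electron volts, so any nonzero f
gives an energy above the main beam value, and a spread of transition
positions gives the continuum of energies observed.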

D.  Data  Reduction  and  Transmission 

The  mass  spectra  acquired  are  processed  locally  to  facilitate  transmission 
to  the  University's  main  computer  (Control  Data  Corporation  CYBER  173).  A 
second  set  of  operations  in  the  CYBER  then  follows,  and  results  in  spectral 
data  suitable  for  the  subsequent  statistical  operations. 

Within our dedicated Finnigan/INCOS computer system the individual mass
spectrometer scans are stored as acquired. Upon completion of the run a
single spectrum is produced by addition of all scans. This spectrum is then
translated into a FORTRAN-readable format for transmission to the
CYBER main computer over a 1200 baud multiplexor phone link.


When the spectra are on file in the CYBER, format and simple logical
checks are made on the data, which are then corrected as required. Each
spectrum is then normalized to unit total area, with individual peak areas
greater than 5 per cent of the total excluded from the normalization sum (for
the rationale see last year's report). All statistical analysis is then
accomplished using these normalized spectra.
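
One possible reading of the normalization rule above is sketched below. It
assumes a single pass in which the 5 per cent threshold is taken relative to
the raw total, and in which the large peaks are left in the spectrum but are
scaled by the same divisor as the rest. The report may have implemented the
details differently, and the function name and example values are
hypothetical.

```python
def normalize_spectrum(peaks, exclusion_fraction=0.05):
    """Scale peak areas so that the peaks at or below the threshold sum to 1.

    `peaks` maps nominal mass to peak area. Peaks larger than
    `exclusion_fraction` of the raw total are left out of the divisor only.
    """
    total = sum(peaks.values())
    norm_sum = sum(a for a in peaks.values() if a <= exclusion_fraction * total)
    if norm_sum == 0:
        raise ValueError("no peaks left in the normalization sum")
    return {mass: area / norm_sum for mass, area in peaks.items()}


# Hypothetical spectrum: forty small peaks and one dominant peak at mass 180.
raw = {mass: 1.0 for mass in range(100, 140)}
raw[180] = 60.0
normalized = normalize_spectrum(raw)
print(normalized[100], normalized[180])
```

Excluding dominant peaks from the divisor keeps one very abundant component
from depressing the normalized values of every other peak in that sample.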


3.  DIAGNOSTIC  STATISTICAL  ANALYSIS. 

A.  Introduction 

During  1979  we  have  devoted  a  substantial  effort  to  evaluate  our 
diagnostic  statistical  analysis  comparing  a  number  of  alternative  techniques 
for  the  selection  of  variables  and  for  the  separation  of  cases  into 
diagnostically  meaningful  groups. 

The  statistical  analysis  in  this  project  has  the  following  objectives: 

a.  To  separate  the  biological  samples  (cases)  into  statistically 
distinct  groups  correlated  with  the  disorder  of  interest  (e.g. 
healthy  vs.  diseased,  healthy  vs.  alcoholic  liver  disease,  pneumonia
vs.  bronchitis,  bacterial  vs.  viral  pneumonia,  etc.)  by  a 
characteristic  set  of  variables. 

b.  To  assign  correctly  an  unknown  case  to  one  of  a  number  of 
pre-specified  groups,  which  have  been  previously  developed  on  the 
basis  of  a  continuously  increasing  learning  set. 

c.  To identify the variables that best characterize a given pathological
state, in order to facilitate the understanding of the biochemical
nature of the disease and possibly also in order to explore the
possibility of quantitative assay of the particular metabolite by a
simpler non-mass spectrometric analytical technique.

d.  To identify variables that co-vary, for two reasons: first, to attain
a better understanding of the biochemical nature of the disease, and
second, to minimize undue weighting bias in the differentiation of
cases into diagnostic groups.

Different  multivariate  analysis  techniques  have  provided  us  with  answers 
to  these  questions  with  different  degrees  of  efficiency.  As  stated  above  we 
have  started  to  compare  different  statistical  techniques  for  their  merits  in 
meeting  our  objectives.  In  the  following  sections  we  shall  discuss  the 
current  status  of  this  evaluation,  which  will  be  continued  during  the 
forthcoming  stages  of  this  project. 

We  may  separate  our  current  statistical  analysis  into  two  phases  -  the 
selection  and  rating  of  variables  (mass  spectral  peaks)  according  to  their 
diagnostic  value,  and  the  classification  of  cases  (patients'  urine  or  tissue
culture  media)  into  groups. 

B.  Selection  and  Rating  of  Variables 

Each  normalized  mass  spectrum  comprises  hundreds  of  variables  (peaks)  each 
of  which  represents  the  concentration  of  a  metabolite  (or  a  group  of 
metabolites  sharing  the  same  nominal  mass)  in  the  biological  sample.  When  we 
compare  the  magnitudes  of  a  given  variable  in  samples  coming  from  two 
biochemically  distinct  groups  we  observe  three  types  of  variation: 

-  variation  due  to  the  analytical  procedure 

-  variation  due  to  biological  variance  (due  to  genetic  or  nutritional 
factors),  and

-  variation  associated  with  the  experimental  difference  between  the 
groups,  e.g.  variation  due  to  the  pathological  status  of  a  human 
subject,  or  the  infected  status  of  a  tissue  culture. 




Ideally there should be no variation of the first kind, and the variation
due to the pathological status should be far larger than the biological
variation. In reality, however, the majority of variables show a large
biological variation (on top of a finite experimental variance) and are thus
of minimal diagnostic value (we shall call these diagnostically useless
variables). Our problem is, therefore, to select those variables which may be
diagnostically useful.

Any statistical diagnostic procedure will become ineffective if given an
excessive number of diagnostically useless variables, even in the presence of
useful ones. On the other hand, we would like to utilize every useful
variable, since each of these increases the diagnostic power of the
classification procedure. An acceptable variable selecting procedure has,
therefore, to identify and reject useless variables, while retaining all the
useful ones. Moreover, since there will always be variables more "useful" than
others, an adequate procedure should rate them accordingly, and thus allow us
to optimize the diagnostic power of the classification.

There  is  another  factor  that  should  be  taken  into  account  -  covariance. 

In  a  biological  system  there  are  many  variables  that  are  biochemically 
interrelated,  so  that  their  variation  associated  with  a  given  pathological 
state is interdependent. If such variables are used in a statistical method
based on a pattern of a given number of independent variables, they may bias
the result by giving an unduly high weight to a single variation (accompanied
by a set of dependent variables). It would be advantageous, therefore, to
identify such co-variances and eliminate the satellite variables from the
diagnostic classification pattern. A desirable feature of a selecting

This non-parametric ranking of variables according to the probability of
being constituents of the same population has been described in our previous
reports. This ranking has two important shortcomings: first, it does not
identify covariance, and second, it will not discriminate against artifactual
deviants. In fact, this treatment may give a variable (peak) with an improbably
large deviation in one of the cases (spectra) an unduly low
probability. This second shortcoming is avoided in the t-test variable rating.

B2.  The  t-test  rating 

We have applied the t-test program P3D of the BMDP program package (UCLA
1977) to test the null hypothesis that each of the variables in two
tested groups of cases belongs to the same population. Unlike in the Wilcoxon
test, this probability is calculated on the basis of deviations from the
group's average, while taking into account the difference in values between
the two group averages. This program also provides us with the variance of
each variable in each group, so that artifactual deviants can be readily
identified and discounted.

In  spite  of  the  intrinsic  differences  between  the  Wilcoxon  and  the  t-test, 
the two programs identified 50 out of 400 peaks from the same test set of
spectra  as  of  prime  diagnostic  value  with  an  overlap  of  over  90%  of  the 
variables selected (although the order of ranking by the two procedures was
somewhat different). In view of this finding, and since the t-test provides
additional useful information, we now prefer to use this program for selection
of the diagnostic peaks.
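
The t-test rating described above can be illustrated with a minimal sketch. It assumes a separate-variance (Welch) t statistic and ranks peaks by its absolute value; the function names and the intensity values are hypothetical and are not taken from the BMDP P3D program or from our data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Separate-variance (Welch) t statistic for two samples."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def rank_peaks(group1, group2):
    """Rank peak labels by |t|, largest first.

    group1 and group2 map a peak label (e.g. a nominal mass) to the list of
    normalized intensities observed for that peak in each group.
    """
    scores = {peak: abs(welch_t(group1[peak], group2[peak])) for peak in group1}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical intensities for three peaks, invented for illustration only.
patients = {95: [5.1, 5.3, 4.9, 5.2], 120: [2.0, 2.4, 1.8, 2.2], 197: [9.0, 9.4, 8.8, 9.1]}
controls = {95: [3.0, 3.2, 2.9, 3.1], 120: [2.1, 2.3, 1.9, 2.2], 197: [8.9, 9.5, 8.7, 9.2]}

ranking = rank_peaks(patients, controls)
print(ranking)
```

A peak with a single artifactual deviant inflates its group variance and therefore lowers |t|, which is the property that distinguishes this rating from the rank-based one.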

B3.  The Stepwise Discriminant Analysis Procedure.

This procedure, to be discussed below, selects and rank orders variables
according to their "F" values. The F value for each variable is proportional
to the square of the intergroup difference and inversely proportional to the
square of the intragroup variance around that group's average. Since this
procedure requires considerably more computer capacity than the two preceding
methods, it can handle just a limited number of variables (50 peaks in our
case). This limitation requires pre-selection of variables by one of the
preceding techniques, to be followed by their F value ranking according to
their usefulness in separating the cases into distinct groups. The main
feature of this statistical procedure is its ability to identify and reject
co-variants. In spite of this important advantage, the discriminant analysis
can hardly be considered a practical peak selecting procedure, because of its
high demand on computer time and capacity. However, since this technique is
being used as a group classification and case assignment procedure, the peak
selection and ranking according to the F values may be considered as a fringe
benefit.
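
As an illustration of an F value for a single variable, the sketch below computes the textbook one-way analysis-of-variance ratio (between-group mean square divided by within-group mean square). This is an assumed form chosen for illustration; the exact statistic computed by the discriminant analysis program may differ in detail.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F for one variable.

    groups is a list of lists, one list of measured values per group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Between-group mean square: spread of group averages around the grand mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Within-group mean square: spread of cases around their own group average.
    within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n - k)
    return between / within

# Hypothetical values for one peak in two groups of three cases each.
f_value = f_ratio([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f_value)
```

A large intergroup difference raises the numerator and a large scatter within the groups raises the denominator, so the ratio orders variables by how cleanly they separate the groups.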

B4.  Modes  of  use  of  selected  variables 

The  variables  for  a  given  classification  procedure  can  be  selected  by 
virtue  of  meeting  a  certain  arbitrary  criterion  (e.g.  having  a  p  value  below  a 
given  value),  or  by  rank  ordering  according  to  a  given  criterion  (e.g. 
starting  with  the  variable  of  lowest  p  value,  followed  by  the  next  lowest,  and 
so on) and then picking an arbitrary number (e.g. 50) having the lowest p
values. The selected variables can then be used in the group classification

Alternatively, a characteristic classification parameter (e.g. the p value)
can be used as a weighting factor. Since the diagnostic usefulness increases
as p decreases, using 1/p as a weighting factor is perhaps the simplest
weighting procedure. This weighting will, however, give variables with very
small p values a very high weight. Alternative weighting factors could be, for
instance, $1/\sqrt{p}$ or $\log(1/p)$, which would decrease the overweighting
of variables with very small p values.
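
The effect of the choice of weighting function can be seen in a short numerical sketch. The scheme names are hypothetical, the square-root form is included as an assumed example of a milder weighting, and the base of the logarithm (10 here) is also an assumption, since the text does not specify it.

```python
import math

def weight(p, scheme):
    """Return a weighting factor for a peak with p value p (0 < p <= 1)."""
    if scheme == "inverse":
        return 1.0 / p
    if scheme == "inverse_sqrt":
        return 1.0 / math.sqrt(p)
    if scheme == "log":
        return math.log10(1.0 / p)
    raise ValueError(f"unknown scheme: {scheme}")

# Hypothetical p values spanning several orders of magnitude.
p_values = [1e-6, 1e-3, 0.05]
for scheme in ("inverse", "inverse_sqrt", "log"):
    ws = [weight(p, scheme) for p in p_values]
    # Express each weight relative to the weight of the least significant peak.
    print(scheme, [round(w / ws[-1], 1) for w in ws])
```

With 1/p the most significant peak dominates the sum by several orders of magnitude, whereas the logarithmic form keeps all three weights within one order of magnitude of each other.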

There  are  advantages  to  either  peak  selection  procedure.  The  discriminant 
method  (by  an  arbitrary  cut-off)  requires  human  judgement  for  each  set  of 
cases.  This  shortcoming  can  be  eliminated  by  rank  order  cut-off.  Using  the 
t-test  one  can  use  the  variance  in  addition  to  the  t  or  p  values  as  a  second 
criterion  in  selection. 

The  weighting  procedure  while  free  from  subjective  intervention 
nevertheless  requires  optimization  to  obtain  the  best  use  of  separating 
variables.  However,  the  weighting  of  variables  as  in  the  WNI  procedure  (see 
our  1978/9  report)  involves  the  choice  of  an  arbitrary  weighting  function 
which  is  at  best  a  compromise  between  an  optimized  use  of  the  variables  and  a 
use  based  on  a  subjective  threshold. 

C.  Group  Classification  Procedures 

During the previous phase of this program we used basically just one
classification procedure, namely the weighted non-parametric index (WNI)
method, which has been described in our previous reports.

This procedure, which is simple and straightforward, has certain
limitations. First, it is applicable to only two groups of variables.
Although when WNI (1) is plotted vs. WNI (2) one can obtain some secondary
clustering, indicating sub-groupings, the separation between these is
by variables with large variances from their respective group averages. This
is especially true when the difference between the group averages is relatively
small. Third, the diagnostic reference point on the D = WNI (1) - WNI (2)
scale is arbitrary, which becomes problematic if the D values for members of
the two groups form a progressive continuum without a significant gap between
the D's of the two groups. Fourth, this "non-parametric" procedure does not
provide us with a measure of the probability of a given case belonging to each
group; thus it lacks a quantitative measure for the diagnostic assignment
of a given case to a particular group.

In view of these limitations, we have experimented with two other
classification procedures: the clustering analysis procedure (P2M procedure,
BMDP, UCLA, 1977) and the stepwise discriminant analysis procedure (P7M, BMDP,
UCLA, 1977). The former classification is free of the bias of assignment of
cases to a particular number of groups, whereas the latter can efficiently
handle a large number of pre-specified groups and provides the
probability of each case belonging to any of the groups in question.

C1.  The Clustering Analysis

In this procedure one represents each of the cases (each with n variables)
as a point on a surface of an n-dimensional space. This is done by calculating
an n-dimensional vector as a resultant of the values of the variables measured
on n orthogonal coordinates. If we have many cases, each constituting a point
on the n-dimensional surface, we can calculate the distance on this surface
between any given two points. The clustering procedure selects the two points
with the shortest n-dimensional Euclidean distance between them, producing a
cluster of two. Then the program tries to find among all the remaining points
a (third) point (case) closest to the first two points, forming a cluster of
three. Again the program seeks a (fourth) point closest to the
cluster of three, and so on. When the distance between the growing cluster
and  the  next  point  is  larger  than  between  a  pair  of  the  remaining  points,  a 
new  cluster  of  two  is  selected,  which  can  again  grow,  aggregating  points  in 
its  vicinity.  This  process  continues  until  the  nearest  distance  remaining  is 
the  distance  between  the  boundaries  of  two  clusters,  which  are  then  registered 
as  a  cluster  of  clusters.  The  classification  is  ended  when  all  points  are 
accounted  for,  when  a  master  cluster  containing  all  the  points  (all  the  cases) 
is  registered.  This  approach  is  completely  bias-free  as  far  as  the  number  of 
groups  (clusters)  it  will  form  from  the  set  of  cases;  only  after  clustering 
can  one  check  the  a  priori  assignment  of  a  given  case  against  the  cluster  it 
ended  up  in.  The  program  also  allows  the  identification  of  the  relative 
positions  of  points  (cases)  within  clusters.  On  the  other  hand  the  procedure 
does  not  test  the  variables  for  their  variances  from  a  group  average  (which  is 
done  in  the  WNI  and  in  the  discriminant  analysis  classifications)  or  for 
co-variance,  which  is  performed  in  the  discriminant  analysis. 
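
The amalgamation process described above can be sketched as follows. This is a minimal illustration assuming single-linkage (nearest boundary) merging on Euclidean distances; it is not the BMDP P2M program, whose linkage rule may differ, and the case coordinates are invented.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two cases given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points):
    """Return the amalgamation order as (distance, members_a, members_b) tuples."""
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for ca, cb in combinations(clusters, 2):
            # Distance between clusters = distance between their nearest members.
            d = min(euclidean(points[i], points[j]) for i in ca for j in cb)
            if best is None or d < best[0]:
                best = (d, ca, cb)
        d, ca, cb = best
        clusters = [c for c in clusters if c not in (ca, cb)]
        clusters.append(ca | cb)
        history.append((d, sorted(ca), sorted(cb)))
    return history

# Five hypothetical cases in two dimensions: three near the origin, two far away.
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.3), (5.0, 5.0), (5.2, 5.0)]
steps = single_linkage(cases)
for distance, left, right in steps:
    print(f"{distance:.2f}", left, right)
```

No number of groups is supplied to the procedure; the final step always joins everything into one master cluster, and the grouping is read from the jump in amalgamation distance.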

C2.  The  Stepwise  Discriminant  Analysis 

This  classification  procedure  separates  the  cases  into  a  prespecified 
number  of  groups  after  analyzing  the  variance  of  each  variable.  This 
procedure  also  selects  those  variables  which  separate  the  cases  into  the 
specified  groups  most  effectively. 

The procedure first determines the variance of each variable within each
group and compares it to the variances between groups. The comparison is done
by calculating F values, i.e. dividing the square of the intergroup variance
($S_g$) by the square of the variances ($S_i$) of the individual variables
around the corresponding group average: $F = S_g^2 / S_i^2$. The program
then selects the variable with the highest F value, and if this is larger than
a pre-specified threshold ("F-to-enter") it will use this variable to separate
the cases into groups.

In the second step it will test all remaining variables for their ability
to  separate  between  groups  on  a  2  dimensional  surface,  comparing  again  the  new 
intergroup  variance  to  the  group  variances.  The  variable  with  the  highest  "F 
to  enter"  value  at  this  step  will  be  then  added  to  the  first  selected 
variable,  provided  its  F  value  exceeds  the  "F  to  enter"  threshold  and  provided 
that  a  correlation  coefficient  between  the  two  variables  is  not  above  a 
specified  limit.  A  high  correlation  would  obviously  invalidate  the  group 
classification  by  two  presumably  independent  variables. 

After  the  two  classifying  variables  are  selected  the  program  computes  for 
each  case  a  point  (vector)  on  a  3  dimensional  surface  using  each  of  the 
remaining  variables  combined  with  the  2  variables  selected  in  steps  1  and  2. 
The  variable  with  the  highest  F  value  is  selected,  provided  it  exceeds  the 
threshold  and  that  it  does  not  co-vary  with  the  combination  of  the  two  first 
variables.  If  it  does  not  fulfill  the  second  condition  the  variable  with  the 
next  highest  F  value  is  selected  and  tested  against  the  two  criteria.  The 
variable  selection  procedure  is  continued  until  all  remaining  variables  end  up 
with  F  values  below  the  threshold  or  exhibit  excessive  covariance  with  the  set 
of  selected  variables. 
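
A much simplified sketch of the selection loop described above is given here. It is an illustration under stated assumptions, not the BMDP P7M algorithm: each variable's F value is computed on its own (not conditionally on the variables already entered), covariance is judged by the within-group correlation with each selected variable, and the threshold values, function names and data are hypothetical.

```python
import math
from statistics import mean

def split(values, labels):
    """Group the values of one variable by case label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return groups

def f_ratio(values, labels):
    """One-way ANOVA F for one variable (assumed form of the F value)."""
    groups = list(split(values, labels).values())
    k, n = len(groups), len(values)
    grand = mean(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n - k)
    return between / within

def within_corr(x, y, labels):
    """Correlation of two variables after removing each group's average."""
    def residuals(values):
        means = {lab: mean(g) for lab, g in split(values, labels).items()}
        return [v - means[lab] for v, lab in zip(values, labels)]
    rx, ry = residuals(x), residuals(y)
    num = sum(a * b for a, b in zip(rx, ry))
    return num / math.sqrt(sum(a * a for a in rx) * sum(b * b for b in ry))

def forward_select(data, labels, f_to_enter=4.0, max_corr=0.9):
    """Enter variables in order of decreasing F, skipping co-varying ones."""
    f_values = {name: f_ratio(vals, labels) for name, vals in data.items()}
    selected = []
    for name in sorted(f_values, key=f_values.get, reverse=True):
        if f_values[name] < f_to_enter:
            break  # all remaining variables fall below the threshold
        if all(abs(within_corr(data[name], data[s], labels)) <= max_corr
               for s in selected):
            selected.append(name)
    return selected

# Hypothetical data: three cases per group; m198 co-varies with m197.
labels = ["L", "L", "L", "C", "C", "C"]
data = {
    "m197": [9.0, 9.2, 8.8, 5.0, 5.1, 4.9],
    "m198": [18.1, 18.6, 17.6, 10.1, 10.4, 9.8],
    "m95": [3.0, 2.8, 3.1, 6.0, 6.2, 5.9],
    "m120": [2.0, 2.4, 1.8, 2.1, 2.3, 1.9],
}
chosen = forward_select(data, labels)
print(chosen)
```

In this toy data set the satellite variable m198 is rejected for covariance even though its own F value is high, and m120 is rejected for failing the entry threshold. The re-checking of entered variables against a removal threshold is omitted from the sketch.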

Following  each  step  the  program  also  re-checks  the  F  values  of  all  the 
variables  selected  for  classification  up  to  that  step,  since  their  F  values 
will  change  with  each  added  variable.  If  any  of  the  previously  selected 
variables  has  an  F  value  lower  than  a  given  threshold  ("F  to  remove")  it  will 
be  removed  from  the  classification  set,  and  the  procedure  is  repeated  again 
for  all  the  remaining  non-classifying  variables  to  select  a  new  one  with  an 
acceptable  F  value  and  covariance  coefficient.  The  variables  selected  at  each 
step are combined to form a linear, optimized classification function
(an n-dimensional vector).

Once the variable selection process is complete, using say m variables, the
cases are classified into groups as points on an m-dimensional surface and the
intercase distance is calculated. From these the centroids or group averages
are computed, as is the probability of assignment of each case to any given
group. The grouping can be graphically presented two-dimensionally as points
indicating the group centroids or as clusters of points (representing the
individual cases) around the centroids.

D.  A  Comparative  Evaluation  of  the  Statistical  Classification  Procedures. 

As stated above, each of the 3 classification procedures tested by us has
more desirable and less desirable features when compared to the others. To
illustrate this we applied the 3 procedures to the same set of data, namely
the analysis of 39 spectra from urine of 14 patients with alcoholic liver
disease compared with 26 spectra from 13 healthy adults.

The results of the Wilcoxon test on the 100 peaks of lowest p values are given
in Figure 10. The WNI values of the 65 cases for the two groups using 1/p
weighting, as well as the values of D = WNI (1) - WNI (2), are given in the same
figure. One can see here that all the D values of the pathological samples
(LV) are negative and smaller than -35, whereas those of all the controls (CP)
are positive, with the exception of case CP22, which is negative (-9.5) but
still significantly larger than any of the LV's. In other words, this
classification did not show any overlap between the tested groups. A
computerized graphical presentation of the WNI data is given in Figure 11,
where each case is represented by its WNI (1) and WNI (2) values. We see here
the clustering of the two groups with a distinct region of demarcation between
them. Although WNI (1) = WNI (2) for a case would indicate that it equally
belongs to the two groups, this is not necessarily true when applied
statistically to two groups where at least one cluster has D values very
different from zero.

The clustering analysis applied to the same cases is presented in Figure 12.
Here cases 1 to 23 are of liver patients, and cases 24-48 are of controls.
We see here that clustering began with cases 17, 13 and 6, followed by cases 19
and 11. By the end of the clustering process all samples 1 to 24 (the
pathological cases) were in one cluster, with case 44 (of the controls) being
the next one added to this cluster (but only in step #41 in the amalgamation
order). All the other controls are again in a distinctly different cluster.
One may also distinguish some sub-clusters, like cases 1, 2, 10 and 11, or 15,
21, 17, 16, and 20, among the pathological samples. Since this program does
not presume any predetermined groups for classification, these subgroups may
have some biochemical features in common in addition to being part of a liver
disease population.

Figure 13 presents part of the computer output of the Euclidean distances
between the points representing each case on an n-dimensional surface. (The
distances from the 48 points to just 15 points of other cases are shown in this
figure.) These distances are then amalgamated or clustered as discussed above.

Figure  14  presents  the  same  variables  and  their  respective  intragroup 
averages  used  for  the  subsequent  stepwise  discriminant  analysis. 

Figure  15  shows  the  first  two  steps  of  selection  of  the  two  first 
variables  for  classification,  namely  mass  peaks  197  and  95  respectively.  The 
same  figure  then  describes  step  #15  of  the  procedure  by  which  15  peaks  were 
selected  by  the  "F-to-enter"  and  covariance  criteria. 

Figure 16 presents a later stage in the procedure, step #40, when just 30
variables (peaks) were selected as parameters in the classification, indicating
that 10 variables originally selected were subsequently rejected due to F
values below the "F-to-remove" threshold.

FIGURE 16

Figure 17 presents the distances from each group centroid to the points
representing each of the cases, calculated from the 30 variables finally
selected. This table also gives the probability of assignment of each case to
a given group. One can see that each case in the L group has been assigned
to it with a probability of unity, and similarly each case in the C group
(controls) has been assigned to this group with a probability of unity.

Figure 18 is a graphical presentation of the same data recalculated for a
2-dimensional projection. The digits 1 and 2 are the positions of the centroids
of each group respectively. The resolution of the computerized printout is
limited, so that if more than one case falls on the same unit area of
the plot they will be represented by just a single mark. Therefore, only 19
points are shown for the pathological samples and, coincidentally, 19 points
were printed for the controls. The actual coordinates for each of the 48 cases
are presented in Figure 17.

We see that the stepwise discriminant analysis has separated the cases in
a more "decisive" manner than the WNI test (Figure 11). This is not
surprising in view of the fact that it was given the "best" preselected 50
variables and its own procedure used only 30 out of these for the ultimate
classification. Although the quantitative and graphical presentations by this
procedure are the most "convincing" among the 3 types of classification, it is
the clustering analysis that demonstrated in an utterly unbiased fashion that
we had in this study just two biochemically distinct groups. As stated
elsewhere in this report, we shall proceed with additional comparative
statistical analyses of other series of biological samples, including sets
less distinctly different than the one shown here, before deciding which
statistical treatment is most suitable for a given problem.

4.  EXPERIMENTAL RESULTS

A.  Hospital  Patient  Studies 

The  entire  multicomponent  mixture  analysis  procedure  has  been  tested  using 
four  sets  of  pathological  and  three  sets  of  control  samples.  Two  sets  of 
samples  representing  liver  malfunction  have  been  obtained.  One  set  of  eight 
samples includes a variety of diseases, including malignancies, with liver
involvement  as  a  primary  or  secondary  reason  for  hospitalization.  Another  set 
of  12  samples  is  more  homogeneous  representing  primarily  alcoholic  hepatitis. 

A  sample  set  from  7  individuals  hospitalized  in  the  same  ward  as  the  alcoholic 
hepatitis  patients  was  evaluated  as  a  control  set. 

Samples  were  provided  by  the  staff  of  Childrens  Hospital  from  7  pneumonia 
cases,  12  diarrhea  patients  diagnosed  as  of  viral  origin,  and  6  samples 
selected from patients hospitalized for concussion and tonsillectomy,
situations  unlikely  to  involve  an  infectious  organism.  An  additional  set  of 
13  adult  control  samples  was  obtained  from  healthy  volunteers  within  the 
university. 

Each  sample  was  analyzed  in  duplicate  and  control  and  pathological  samples 
were  interspersed.  Summed  spectra  were  assembled  into  data  sets  for 
processing  in  the  CYBER  by  either  the  Wilcoxon  WNI  programs  or  the  BMDP 
programs. 

The results of the Wilcoxon-WNI program are presented for several data
sets in Figures 19 through 25. The graphs display WNI(1) versus WNI(2) for
each sample in both groups. The WNI(1) axis represents the difference of a
given sample from the average spectrum of the pathological group, while the
WNI(2) axis shows the difference of a spectrum from the control group class
average. In this representation control samples should fall close to the
vertical axis and at a distance from the horizontal, while pathological
samples should fall near the horizontal axis and away from the vertical axis.
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ANBAR.  MICHAEL 
353-30>1453 


FIG.  21  CHILDREN'S CONTROL SAMPLES (CC-CLOSED CIRCLES) VS
CHILDREN'S PNEUMONIA SAMPLES (PC-OPEN CIRCLES) USING 14 m/e VALUES FOR
WNI'S.

FIG.  23  ADULT CONTROL SAMPLES (CA-CLOSED CIRCLES) VS MIXED LIVER
DISORDERS (HA-OPEN CIRCLES) USING ALL m/e VALUES.

The comparison of diarrhea in children with controls (Fig. 19) shows
two control samples falling within the pathological group when all masses are
used with a weighting function of 1/p to compute the WNI's. The use of only
the eight masses of lowest p-value (Fig. 20) not only shows complete separation
of control and diarrhea cases, but seems to reveal a distinct grouping among
the diarrhea cases that may be the result of a different infecting organism.
For several of the diarrhea samples collected within a few days of each other,
rotavirus was detected by independent tests.

The pneumonia cases show one individual from the pathological set falling
within the control group using WNIs based on the 14 peaks of lowest p-value
(Fig. 21). The hospital records indicated that this
patient was discharged the day after the sample was taken, so it might be
surmised that this patient had already recovered from the infection. Adding
this individual to the control group and removing the sample pair from the
pathological group did not appreciably change the clustering of samples
obtained before, indicating that this sample properly belongs in the control
group.

An  example  of  differential  diagnosis  is  given  in  Fig.  22  comparing  the 
diarrhea  to  the  pneumonia  cases.  Incomplete  separation  was  obtained  using  all 
masses  weighted  as  1/p  in  the  WNI  calculation. 

Figures  23  and  24  show  the  mixed  set  of  liver  disorders  compared  to 
healthy  adult  controls,  using  all  masses  weighted  by  1/p  and  using  only  the  12 
masses  of  lowest  p-value.  Again  the  use  of  only  the  diagnostic  peaks  improves 
the  separation  between  the  two  classes.  Comparing  the  alcoholic  hepatitis  set 
with hospital controls (Fig. 25) reveals a weakness in the choice of this
control set, so that although the alcoholic hepatitis samples are tightly
clustered, a number of samples from the control set fall within this
grouping. In this experiment patients with gastric ulcer, functional gall
bladder disease, and anemia appear to have similar profile patterns to those
with liver disorders. This example illustrates the importance of obtaining
well characterized samples upon which to base the initial pattern analysis.
Comparing the alcoholic hepatitis set with the adult controls (Fig. 11) shows
a much clearer separation. In both comparisons the set of pathological
samples appears to be a well defined homogeneous group. The use of the adult
control set in the comparison generates a different set of masses with the
lowest p-values. In addition, the p-values were 2 to 3 orders of magnitude
smaller using the adult controls. Tabular results of the Wilcoxon-WNI
analysis of these samples are presented in Figure 10.

These two groups of samples were also analyzed using the BMDP programs.
First the BMDP 3D program was used to calculate the t-statistic, separate and
pooled, for the peak intensities at each m/e value for the null hypothesis of
equivalent means for both the pathological and control groups. Based on these
results 51 masses were selected. A significant number of the masses selected
by this method also show low p-values by the Wilcoxon test. Using these
selected variables, both cluster and stepwise discriminant analysis programs
were used to classify the individual cases. These results have been described
in section 3D.

B.  Longitudinal  Studies  on  Virus  Infected  Patients. 

Controlled  longitudinal  studies  were  carried  out  on  two  sets  of  volunteers 
at  the  AFRIID.  One  group  consisted  of  seven  individuals  who  received  live 
virus  vaccine  for  sandfly  fever.  Two  additional  individuals  in  this  group 
received a placebo injection. None of the participants were told whether they
received the vaccine or control injection. Morning urine samples were
collected prior to the injection, on the day of the
injection, for 8 consecutive days subsequent to injection, and finally 28 days
after the injection.

A second volunteer group, inoculated with Dengue fever live virus
vaccine, provided morning urine samples 3 days prior to and for 21 days
following injection. All samples were stored frozen without preservative,
shipped to this laboratory packed in dry ice, and kept frozen until analysis.
We initially analyzed samples from three individuals in the sandfly fever
experiment and two participants of the Dengue fever group without any prior
knowledge of the sample classification (vaccine vs. control) or the expected
time and duration of the symptoms. The samples from each individual were
prepared and analyzed in duplicate in a random sequence. In a few cases
during this series a sample had to be repeated due to instrument malfunction
during data acquisition. In these cases we found that better replicate
samples were obtained by using the salt saturated urine remaining from the
previous extraction rather than using the original refrozen sample. We
suspect that a bacterial contamination may have affected the original samples
during their exposure to room temperature. We do not observe similar
differences in repeated analyses of samples that are collected and stored with
ZnSO4 as a preservative.

The data from the first three individuals of the sandfly fever experiment showed an
extreme  difference  in  the  patterns  of  two  of  the  individuals  (Joffe,  LeBlanc) 
compared  to  the  third  (Berry). 

Throughout the sampling period Berry's profiles show significantly lower
intensity over the entire mass range. This individual also apparently did not
consume any caffeinated beverages during the study, leading to a further
qualitative difference in this person's pattern compared to the other two
participants. We, therefore, did not include Berry's patterns in the initial
analysis of the data. (We received the information that this individual
served as one of the controls after the analysis of all samples was
completed.) With only two individuals to examine we were concerned that
dietary  variations  plus  Individual  variations  in  the  response  to  the  vaccine 
might  obscure  the  detection  of  the  pathological  pattern  or  location  of  the 
maximum  response  period.  Two  approaches  were  partially  successful  in  handling 
this problem of deciphering an unknown pattern appearing at an unknown time in
a  small  data  set. 

In the first case the BMDP 7D program for variable stratification was
employed  to  show  the  means  and  intensity  distributions  for  each  m/e  value  as  a 
function  of  the  sample  collection  sequence.  An  example  is  shown  in  Figure 
26.  This  analysis,  although  tedious  and  inefficient  for  routine  use,  did 
reveal  a  number  of  masses  showing  changes  in  intensity  distributions  at  days 
3,  4  and  5.  A  second  approach  involved  comparing  patterns  of  days  minus  6  and 
zero  against  days  3,  4,  and  5. 
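
The stratification step of the first approach can be illustrated with a short sketch. It is not the BMDP program itself; it only groups the intensity at one m/e value by collection day and reports the mean and standard deviation for each day. The sample values, the m/e values 120 and 194, and the function name stratify are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (day, spectrum) pairs; each spectrum maps m/e -> intensity.
samples = [
    (-4, {120: 5.0, 194: 30.0}),
    (-4, {120: 5.4, 194: 28.0}),
    (0, {120: 5.2, 194: 31.0}),
    (0, {120: 4.8, 194: 29.0}),
    (3, {120: 9.0, 194: 30.5}),
    (3, {120: 9.6, 194: 29.5}),
]


def stratify(samples, mz):
    """Return {day: (mean, standard deviation)} of the intensity at one m/e."""
    by_day = defaultdict(list)
    for day, spectrum in samples:
        by_day[day].append(spectrum[mz])
    return {
        day: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for day, values in sorted(by_day.items())
    }


profile = stratify(samples, 120)
for day, (avg, spread) in profile.items():
    print(f"day {day:>3}: mean {avg:.2f}, sd {spread:.2f}")
```

Scanning such a table for each mass is what makes the approach tedious: a shift in the daily mean relative to the spread has to be spotted one m/e value at a time.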

Selection of days 3-5 was based partly on the preceding variable
stratification study and partly on the expected time for development of
symptoms for the particular virus employed. Again BMDP 3D was used for a
t-test on each m/e value between the selected groups of days. Using the
variables selected in this manner, the BMDP 7M discriminant analysis program
was used to derive the classification function for the same two groups (days
-4, 0 vs. 3, 4, 5). In addition, this function was used to classify day 2 as a
third group and days 7, 8 and 28 as a fourth group.
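
The two-stage procedure described above (a t-test on each m/e value, then a classification function built on the selected masses) can be sketched as follows. This is a minimal illustration under stated assumptions, not the BMDP 3D or 7M programs: a nearest-centroid rule stands in for the stepwise discriminant function, the |t| cutoff of 3.0 is arbitrary, and all intensities and m/e values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical intensities; each spectrum maps m/e -> intensity.
normal = [  # stands in for days -4 and 0
    {100: 10.0, 120: 5.0, 150: 7.0},
    {100: 11.0, 120: 5.5, 150: 3.0},
    {100: 9.0, 120: 4.5, 150: 5.0},
    {100: 10.5, 120: 5.2, 150: 6.0},
]
patho = [  # stands in for days 3 to 5
    {100: 10.2, 120: 9.0, 150: 4.0},
    {100: 9.8, 120: 10.0, 150: 6.5},
    {100: 10.6, 120: 9.5, 150: 5.5},
    {100: 10.0, 120: 9.2, 150: 5.0},
]


def t_statistic(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def select_masses(group_a, group_b, threshold):
    """Return the m/e values whose |t| between the groups exceeds threshold."""
    selected = []
    for mz in sorted(group_a[0]):
        t = t_statistic([s[mz] for s in group_a], [s[mz] for s in group_b])
        if abs(t) > threshold:
            selected.append(mz)
    return selected


def centroid(spectra, masses):
    """Mean intensity of a group at each selected mass."""
    return [mean(s[mz] for s in spectra) for mz in masses]


def classify(spectrum, centroids, masses):
    """Assign a spectrum to the group with the nearest centroid
    (a simplified stand-in for a discriminant function)."""
    def dist2(c):
        return sum((spectrum[mz] - ci) ** 2 for mz, ci in zip(masses, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


selected = select_masses(normal, patho, threshold=3.0)
centroids = {
    "normal": centroid(normal, selected),
    "pathological": centroid(patho, selected),
}
day2 = {100: 10.1, 120: 9.1, 150: 5.2}  # hypothetical day 2 sample
print(selected, classify(day2, centroids, selected))
```

The point of the sketch is the order of operations: the function is derived only from the two reference groups, and samples from other days (day 2, days 7, 8 and 28) are then classified by it without having contributed to it.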

Figure 27 shows the results of this classification. The most obvious
result to be noted is that the "normal" group (days -4 and 0) separates
completely from the presumed pathological group (days 3-5). Of more interest
is the classification of the third group, day 2, showing a pathological
pattern for Joffe and a normal pattern for LeBlanc. In the final group, day
28 shows a return to the normal pattern for both individuals, with days 7 and 8
showing predominantly a pathological pattern.

We have been informed by AFRIID that neither subject developed any
clinical symptoms before the middle of day 3. Since these were morning
urines, we have three urine samples (2 of Joffe and 1 of LeBlanc) that showed
the pathological pattern prior to manifestation of clinical symptoms. We
are also told that by day 7 both subjects were feeling well and free of
symptoms, yet on day 8 LeBlanc still shows an unambiguous pathological
pattern, whereas Joffe shows an ambiguous result. Unfortunately, in this
series no samples were collected between day 8 and day 28, so that it is
impossible to determine at this point the time of unambiguous disappearance of
the pathological pattern. In any case it seems to appear prior to the clinical
symptoms and persist beyond the time of their disappearance.

The samples from the remaining participants in this study were recently
shipped to us, and days -4, 0, 3, 4 and 5 from persons receiving the vaccine
were selected for initial analysis. During the preparation of this set the
addition of EDTA was inadvertently omitted from the extraction procedure. Our
statistical analysis of this sample set shows a detectable pattern difference
due to this change in the procedure, making a direct classification of all
seven vaccinated individuals more complicated. The use of the t-test
comparing normal and "pathological" periods of this last set of samples leads
to a selection of variables that correctly classifies these samples using the
discriminant analysis program (see Figure 28). The use of these variables
with the discriminant analysis program to classify samples from all seven
individuals into four groups was also successful; still, we plan to repeat the
analysis of the first set of samples without using EDTA in the extraction
procedure to eliminate this artifact from the classification.

The more interesting question is, however, whether the 5 additional
subjects will exhibit a behavior similar to that of the first 2 subjects on
day 2 and on days 7 and 8, and whether no effect will be found in the placebo
cases.

These  results  will  be  reported  in  next  year's  report. 

C.  Study  of  Human  Tissue  Culture  Infected  With  Polio  Virus. 

This  study  was  undertaken  to  determine  the  differences  between  infected 
and  uninfected  cell  cultures  and  to  see  how  soon  after  infection  of  the 
culture  such  differences  can  be  detected.  We  have  begun  this  study  with  a 
cell line of human embryonic lung and the Mahoney strain of polio virus. The
tissue culture work was done by Dr. Howard Faden and his associates at the
Children's  Hospital  in  Buffalo. 

The human embryonic lung culture was originally established by Dr. Fishaut
of Denver by 7 to 8 passes of tissue obtained from an aborted fetus. These
cells were grown in Eagle's minimal essential growth medium which was
supplemented with 5% newborn calf serum. Cultures with equal numbers of cells
were prepared in 5 ml plastic culture tubes that were incubated at 37°C for
zero, 6, and 24 hours. For each of these three time points, nine tubes were
prepared. Three tubes were left uninfected, three were infected with 1/10 ml
of viral solution with a tissue culture infective dose (TCID50) of 1 x
10^, and three tubes with a TCID50 of 1 x 10^.

These titers of virus were allowed to adsorb onto the cells for one hour.
At that time the cells were washed with isotonic phosphate buffer solution to
remove excess virus particles. Following the wash, medium with newborn calf
serum was added to the tubes, and the cells were incubated for the times
indicated. When the incubation times were reached, the tubes were
refrigerated and centrifuged to separate the cells. We received the chilled,
cell-free supernate of this centrifugation.


To one ml of this medium we added 0.25 ml of cold 70% perchloric acid and
0.25 ml of cold 11 M KOH to precipitate the proteins of the medium. To ensure
inactivation of the virus, 20 microliters of 0.3 M HgCl2 was added to each
tube. The supernate of each tube was removed and diluted with an equal volume
of distilled water to facilitate pH adjustment. This diluted medium was then
titrated with KOH and HClO4 to a pH of 2, 7, or 10. A 150 microliter sample
was removed at each pH and placed in a 1 cm by 0.3 cm glass culture tube which
contained a folded 3 cm x 1 mm strip of Whatman GF/C glass fiber paper. The
samples were evaporated onto this paper by blowing warm, dry nitrogen over
them. The paper, with the sample dried onto it, was then introduced into the
mass spectrometer.

Initially, high and zero virus samples incubated for zero and 24 hours from
each of the 3 tubes were prepared at pH 2, 7, and 10 in duplicate. The data
were first examined using the t-test to find peaks showing differences at
24 hours between zero and high virus. Using masses selected in this manner,
cluster analysis and discriminant analysis programs were used to classify the
four groups consisting of 24 hours high and zero virus and 0 hours high and
zero virus. A superior separation among these groups was observed for the
samples prepared at pH 2 (see Figures 29 through 31). Based on this analysis,
pH 2 was selected for preparation of high and low virus samples from 6 hours
and 24 hours.

The initial classification into four groups revealed several trends in the
patterns. First, there was a small but consistent difference in the zero time
patterns of different virus concentrations, indicating a change in the
composition of the media due to the adding of virus. This is an artifact not
normally considered in cell culture experiments, and it can be eliminated in
future studies by making all virus inoculations, including the zero virus,
using a common medium. Second, the incubation time and cell growth lead to
time dependent changes in the intensity of selected m/e values. We observe
both increases, which may be due to cell metabolites excreted into the media,
and decreases, possibly reflecting depletion of nutrients in the media. The
third detectable trend is for the virus infected tubes to show similar time
dependent trends but of reduced magnitude, consistent with an inhibition of
the effective cell growth rate. That is, the same peaks which increase or
decrease in uninfected cultures show a smaller increase or decrease,
respectively, in the infected cultures. Finally, and more interesting, there
are a limited number of m/e values showing intensity changes that may
specifically reflect virus activity. These include cases where virus infected
cultures show larger changes in the same direction observed in non-infected
tubes.
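
The distinction between the third and fourth trends can be stated as a simple rule on the time dependent change of each peak. The sketch below is only an illustration: the labels "attenuated" and "enhanced", the function name, and the intensities are hypothetical, and no significance test is applied to the differences.

```python
from statistics import mean


def classify_trend(uninf_0h, uninf_24h, inf_0h, inf_24h):
    """Compare the 0 to 24 hour change of one peak in uninfected and infected tubes.

    "attenuated": same direction, smaller change in infected tubes
                  (consistent with inhibited cell growth).
    "enhanced":   same direction, larger change in infected tubes
                  (a candidate for a virus specific effect).
    "other":      opposite direction, no change, or equal change.
    """
    d_uninf = mean(uninf_24h) - mean(uninf_0h)
    d_inf = mean(inf_24h) - mean(inf_0h)
    same_direction = d_uninf * d_inf > 0
    if same_direction and abs(d_inf) < abs(d_uninf):
        return "attenuated"
    if same_direction and abs(d_inf) > abs(d_uninf):
        return "enhanced"
    return "other"


# Hypothetical triplicate intensities for two peaks.
print(classify_trend([10, 11, 12], [20, 21, 22], [10, 11, 12], [14, 15, 16]))
print(classify_trend([30, 31, 32], [25, 26, 27], [30, 31, 32], [20, 21, 22]))
```

In practice the rule would be applied only to peaks already selected by the t-test, since small differences in either direction are otherwise dominated by noise.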

Two  additional  t-tests  were  used  to  obtain  new  groups  of  m/e  values.  By 
performing  the  t-test  on  zero  versus  24  hours,  at  high  virus  concentration,  a 
set  of  peaks  was  obtained  reflecting  both  time  and  virus  dependent  pattern 
changes. 

Similarly  a  set  of  zero  time  media  composition  dependent  peaks  was  found 
using  the  t-test  to  compare  high  and  zero  virus  at  zero  time.  Three  new  sets 
of  variables  were  generated  from  the  initial  42  variables  selected  using  the 
t-test  comparison  between  zero  and  24  hours.  For  each  set  of  variables  the 
discriminant  analysis  program  was  used  to  develop  a  classification  function 
for  five  groups  consisting  of  the  following: 
group 1: all zero hour samples
group 2: high virus, 6 hrs
group 3: high virus, 24 hrs
group 4: zero virus, 6 hrs
group 5: zero virus, 24 hrs

A set of 27 masses was selected from the above 42 by removing 15 masses
also appearing in the zero hour test group.

These results are summarized in the canonical variable plots of the group
average locations in Figures 32 through 34. The separation obtained with 27
variables (Figure 32) was increased when two additional zero time dependent
masses were removed, as shown in Figure 33. Reducing these 25 variables to 21
by eliminating 4 more peaks in common with the zero time difference leads to
reduced separation, as shown in Figure 34. Thus it is not possible to
eliminate all zero time artifacts without losing some of the time and virus
specific information contained in the variables removed in the third test.
This compromise, however, would not exist in future experiments designed to
avoid initial differences in media composition.
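
The bookkeeping behind the three variable sets (42 less 15, 17 and 21 masses) amounts to set differences against progressively larger lists of zero time dependent peaks. The sketch below assumes that these lists are nested, which the text suggests but does not state; the m/e values are placeholders, not the masses actually selected.

```python
# Placeholder m/e values standing in for the 42 masses from the
# zero versus 24 hour t-test at high virus concentration.
candidates = set(range(100, 142))

# Assumed nested lists of zero time (media composition) dependent masses.
zero_time_15 = set(range(100, 115))
zero_time_17 = zero_time_15 | {115, 116}
zero_time_21 = zero_time_17 | {117, 118, 119, 120}


def remove_artifacts(masses, artifacts):
    """Drop every mass that also appears in the zero time artifact list."""
    return sorted(masses - artifacts)


variable_sets = [
    remove_artifacts(candidates, artifacts)
    for artifacts in (zero_time_15, zero_time_17, zero_time_21)
]
sizes = [len(s) for s in variable_sets]
print(sizes)
```

Each resulting set would then be passed to the discriminant analysis separately, and the group separations compared as in Figures 32 through 34.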

These preliminary results on the "metabolic profile" of tissue culture
media indicate that the virus infection can be detected in vitro 6 hours
following exposure. We do not know as yet whether this is the shortest time
for a significant virus induced change to be demonstrable by our technique,
nor do we know whether this effect of the virus is specific to a given virus
species. Further study in the coming year will clarify these points.

4. PROPOSED PROGRAM OF RESEARCH FOR 1980/81

Here is our proposed program of research for the coming year, as well as
an outline of some of the follow-on tasks.

A.  Longitudinal  Studies  on  Human  Subjects  with  Infectious  Disease. 

We plan to continue to analyze additional samples of urine to be obtained
from AFRIID from longitudinal studies on volunteers infected with Dengue
fever. These experiments, together with the ongoing tests on urine from a
longitudinal study on sandfly fever, will give us a better base to evaluate the
scope of our diagnostic method. In addition we plan to collect urine samples
from patients in Buffalo hospitals suffering from a number of well diagnosed
infectious diseases, primarily viral infections. Since we found it rather
difficult to obtain well documented samples from the same patient during the
course of a disease, we plan to recruit medical students as project members to
overcome this difficulty. We also plan to test and possibly implement a
simpler and more reproducible sample preparation technique for urine samples.
This technique may cut the overall man-hour requirement per analysis by 30%.

B. The Study of Tissue Culture Media as a Method for Early Detection of Viruses

The preliminary positive findings described in the "Technical Report"
(Part III) encourage us to devote a major effort to this aspect of the
program. We plan to continue the experiments on polio virus in collaboration
with Dr. Howard Faden of the Buffalo Children's Hospital. We plan to extend
this study to different strains of polio virus and then to other viruses. In
collaboration with Dr. Nathan Woodruff of AFRIID we plan to analyze an
extensive matrix of culture media comprising different viruses and different
cell lines sampled at different times during the first day following
infection. The simple sample sterilization and preparation technique
developed this year will be used, and attempts will be made to modify the mass
spectrometric analysis procedure to double the throughput and perform 2
analyses per hour. We also hope to extend the same procedure to identify
microorganisms through their biochemical effects on bacterial culture media.
Since the rapid identification of viruses is the more novel capability, we
plan to give it priority over bacterial studies.

C.  Continued  Evaluation  of  Statistical  Classification  Procedures. 


In the "Technical Report" we show that we made substantial progress in the
use of different classification techniques. In the coming year we plan to
assign a full-time graduate student to this aspect. At this stage of the
program, when the sample preparation and the mass spectrometric analysis give
satisfactory results, we depend on the computerized statistical analysis to
extract as much information as practical from the host of mass spectrometric
data. Each of a number of potential classification techniques will be tested
by a variety of challenges, such as deliberate misassignment of samples
(wrong primary diagnosis), providing the program with deliberate a priori
false positives and false negatives. We will then try to combine
classification techniques into a sequence most appropriate for our diagnostic
purpose. For instance, we may modify the P7M stepwise discriminant analysis
program to become a virtual case assignment program with a fixed
classification library. We may also modify the clustering program to obtain
from it the centroids of individual clusters and use an F-type test of
intergroup vs. intragroup variance to establish the diagnostic separation of
clusters. If the results of the clustering analysis could be visually
presented in a quantitative manner, it would become a preferred approach,
since it does not imply any a priori grouping of the samples analyzed. It may
also be possible to use the clustering analysis as a preliminary group
classification and assignment, followed by discriminant classification. The
latter procedure seems in any case more effective in selection of the most
significant separating variables, and therefore it would be used to select
diagnostic mass peaks for structural identification.
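
One possible form of the F-type comparison of intergroup and intragroup variance is sketched below. It computes cluster centroids and a pseudo-F ratio of the kind used in the Calinski-Harabasz index; this particular statistic is our illustrative choice and not necessarily the test that will be adopted. The two-dimensional points are hypothetical stand-ins for spectra reduced to two variables.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(clusters):
    """Ratio of between-cluster to within-cluster variance.

    between = sum over clusters of size * |centroid - grand centroid|^2 / (k - 1)
    within  = sum over points of |point - own centroid|^2 / (n - k)
    """
    all_points = [p for members in clusters.values() for p in members]
    grand = centroid(all_points)
    k, n = len(clusters), len(all_points)
    between = sum(
        len(members) * sq_dist(centroid(members), grand)
        for members in clusters.values()
    )
    within = sum(
        sq_dist(p, centroid(members))
        for members in clusters.values()
        for p in members
    )
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical clusters in a two-variable space.
clusters = {
    "normal": [[0.0, 0.0], [0.0, 2.0]],
    "pathological": [[10.0, 0.0], [10.0, 2.0]],
}
print(centroid(clusters["normal"]), pseudo_f(clusters))
```

A large ratio indicates clusters that are well separated relative to their internal scatter; the threshold at which a separation is called diagnostic would still have to be established on real data.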

Another  approach  may  involve  the  use  of  the  discriminant  analysis  for  the 
selection  of  a  diagnostic  set  of  peaks,  which  will  then  be  used  to  classify 
new  samples  by  the  spectrum  classification  program  existing  in  our  INCOS 
software  package,  as  described  in  last  year's  report.  We  also  plan  to  test 
the correlation between the non-parametric Wilcoxon test and the parametric
t-test  on  a  number  of  systems  before  making  a  final  decision  on  the 
suitability  of  either  procedure  for  preselection  of  peaks  for  classification 
of  cases. 
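
The planned comparison of the two preselection tests can be illustrated on a toy example. The sketch computes, for each peak, the pooled-variance t statistic and the normal-approximation z score of the Wilcoxon rank-sum statistic. The three peaks and their intensities are hypothetical; peak 103 repeats peak 101 with one extreme value to show where the two tests may disagree.

```python
from math import sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def rank_sum_z(a, b):
    """z score of the Wilcoxon rank sum of group a.

    Uses average ranks for ties and the normal approximation,
    without a tie correction of the variance.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    na, nb = len(a), len(b)
    expected = na * (na + nb + 1) / 2
    sd = sqrt(na * nb * (na + nb + 1) / 12)
    return (rank_sum_a - expected) / sd


# Hypothetical intensities for three peaks in two groups of samples.
peaks = {
    101: ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    102: ([1.0, 2.0, 3.0], [1.5, 2.5, 3.5]),
    103: ([1.0, 2.0, 3.0], [4.0, 5.0, 60.0]),  # same ranks as 101, one outlier
}
scores = {mz: (t_stat(a, b), rank_sum_z(a, b)) for mz, (a, b) in peaks.items()}
for mz, (t, z) in scores.items():
    print(f"m/e {mz}: t = {t:.2f}, z = {z:.2f}")
```

In this example the rank-based score is identical for peaks 101 and 103, while the t statistic of peak 103 is much smaller in magnitude because the outlier inflates the variance. Disagreements of this kind are what the planned correlation study would quantify.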

D.  Tentative  Program  for  Follow-On  Years. 

The  next  phases  of  research  will  include  the  following  tasks  in  addition 
to  those  described  above: 

1. Identification of metabolites found to be characteristically
associated with specific infections, by the use of
collision-induced fragmentation.

2.  Exploration  by  FI-CID  of  characteristic  patterns  in  certain  classes 
of  metabolites,  e.g.,  carboxylic  acids,  primary  alcohols,  primary 
amines  identified  by  a  characteristic  fragment. 

3. Application of multicomponent analysis to the characterization of
bacteria  -  a  topic  that  has  not  been  developed  further  since  the  very 
first  year  of  this  program  of  research. 

4. Development of more subtle and faster statistical analysis techniques
that might be implemented on line in (practically) "real time" on our
dedicated computer. This phase of the research program is essential

5. Exploration of the possibility of using other biological fluids,
primarily plasma or saliva, for rapid diagnosis of disease. The
preparation of plasma samples for mass spectrometric analysis, which
was worked out during the last year, is a first step in this
direction. The examination of urine of patients with urinary
infections and of pus will be undertaken to characterize these fluids
through bacterial metabolites.


